In the 21st century, cars are an important mode of transportation that gives us personal control and autonomy. In day-to-day life, people use cars for commuting to work, shopping, visiting family and friends, etc. Research shows that more than 76% of people refrain from traveling somewhere if they don't have a car. Most people buy different types of cars based on their day-to-day necessities and preferences. So, it is essential for automobile companies to analyze the preferences of their customers before launching a car model into the market. Austo, a UK-based automobile company, aspires to grow its business into the US market after successfully establishing its footprint in the European market.

To become familiar with the types of cars customers prefer and the factors influencing car purchase behavior in the US market, Austo has contracted a consulting firm. Based on various market surveys, the consulting firm has created a dataset of 3 major types of cars that are extensively used across the US market. The firm has collected various details about the car owners, which can be analyzed to understand the US automobile market.

Objective

Austo’s management team wants to understand buyer demand and trends in the US market. They want to build customer profiles based on the analysis to identify new purchase opportunities so that they can adjust the business strategy and production to meet certain demand levels. Further, the analysis will be a good way for management to understand the dynamics of a new market. Suppose you are a Data Scientist working at the consulting firm that has been contracted by Austo. You are given the task of creating buyer profiles for different types of cars with the available data, as well as a set of recommendations for Austo. Perform the data analysis to generate useful insights that will help the automobile company grow its business.

Data Description

austo_automobile.csv: The dataset contains buyer data corresponding to different types of products (cars).

Data Dictionary

- Age: Age of the customer
- Gender: Gender of the customer
- Profession: Indicates whether the customer is a salaried or business person
- Marital_status: Marital status of the customer
- Education: Refers to the highest level of education completed by the customer
- No_of_dependents: Number of dependents (partner/children/spouse) of the customer
- Personal_loan: Indicates whether the customer availed a personal loan or not
- House_loan: Indicates whether the customer availed a house loan or not
- Partner_working: Indicates whether the customer's partner is working or not
- Salary: Annual salary of the customer
- Partner_salary: Annual salary of the customer's partner
- Total_salary: Annual household income (Salary + Partner_salary) of the customer's family
- Price: Price of the car
- Make: Car type (Hatchback/Sedan/SUV)

Observations

1. Customer ages vary across the dataset.

2. Profession takes only two values: salaried and business.

3. Some customers have partners who are unemployed.

Comment

The data has 1581 rows and 14 columns.

Observations

All columns have 1,581 non-null values each. The data contains integer and object data types.
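The shape and data-type checks described above can be reproduced with a few pandas calls. This is a minimal sketch: the `toy` frame and the helper `summarize_structure` are hypothetical stand-ins, since the original code cells and the CSV file are not reproduced here.

```python
import pandas as pd

# Toy stand-in for austo_automobile.csv; the values are made up for illustration.
toy = pd.DataFrame({
    "Age": [25, 31, 44],
    "Gender": ["Male", "Female", "Male"],
    "Salary": [52000, 61000, 88000],
    "Make": ["Hatchback", "Sedan", "SUV"],
})


def summarize_structure(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its dtype and non-null count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "non_null": df.notna().sum(),
    })


n_rows, n_cols = toy.shape          # (rows, columns)
structure = summarize_structure(toy)
print(n_rows, n_cols)
print(structure)
```

On the real file, the same calls would be applied to the frame returned by `pd.read_csv("austo_automobile.csv")`.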

Observations

1. Married, educated males who do not own a house bought most of the cars.

2. The majority of the cars bought were hatchbacks.

3. The median values are: Age 29, No_of_dependents 2, Salary 59,000, Partner_salary 25,000, Total_salary 78,000, and Price 31,000.

4. The maximum age is 60 and the minimum is 22.

5. The maximum total salary is 158,000 and the minimum is 30,000.

6. The average price of a car is between 31,000 and 35,000.

There are no missing values in the dataset.
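A missing-value check of this kind is usually a per-column count of nulls. The sketch below uses a hypothetical `toy` frame that deliberately contains gaps so the output is not all zeros; on the dataset described here, every count would be zero.

```python
import numpy as np
import pandas as pd

# Toy frame with deliberate gaps (made-up values, not the Austo data).
toy = pd.DataFrame({
    "Age": [25, 31, np.nan],
    "Make": ["SUV", None, "Sedan"],
    "Price": [55000, 33000, 21000],
})

# Count nulls per column, then reduce to a single yes/no flag.
missing_per_column = toy.isna().sum()
has_missing = bool(missing_per_column.sum() > 0)

print(missing_per_column)
print("Any missing values:", has_missing)
```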

Let's check the count of each unique category in each of the categorical variables.
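One way to do this is to loop over the categorical columns and call `value_counts` on each. The column list below is taken from the data dictionary; the `toy` frame is a made-up stand-in for the real data.

```python
import pandas as pd

# Toy stand-in with a few of the categorical columns (values are illustrative).
toy = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Male"],
    "Profession": ["Salaried", "Business", "Salaried", "Salaried"],
    "Marital_status": ["Married", "Married", "Single", "Married"],
    "Make": ["Hatchback", "Sedan", "Hatchback", "SUV"],
})

cat_cols = ["Gender", "Profession", "Marital_status", "Make"]

# Count how often each category occurs in each column.
category_counts = {col: toy[col].value_counts() for col in cat_cols}

for col, counts in category_counts.items():
    print(f"Column: {col}")
    print(counts)
```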

Observation

1. The maximum age is 60 and the minimum is 22.

2. The age distribution is skewed to the right.

3. The majority of customers are middle-aged.

4. The median age is 29, but the mean is about 32, which is consistent with the right skew.

5. There are outliers in this variable.

6. Customers aged 60 buy the more expensive cars.
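The link between skew and the mean-median gap can be checked numerically as well as visually. The sketch below uses a small, made-up `age` series (not the Austo data): a positive skewness value and a mean above the median both point to a right-skewed distribution.

```python
import pandas as pd

# Made-up ages with a long right tail, for illustration only.
age = pd.Series([22, 24, 25, 27, 29, 30, 33, 41, 54, 60], name="Age")

mean_age = age.mean()
median_age = age.median()
skewness = age.skew()   # sample skewness; > 0 means a longer right tail

print(f"mean={mean_age}, median={median_age}, skew={skewness:.2f}")
```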

Observations

1. Salary does not have any outliers.

2. The distributions of Salary and Total_salary are both close to normal. A correlation between the two is expected, because Total_salary includes Salary.

3. None of the salary variables has outliers except Total_salary.

4. Partner_salary is skewed to the right. Its distribution looks similar in shape to that of No_of_dependents, which hints at a possible correlation between the two, although the plots alone do not confirm it.
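Outliers like those mentioned above are commonly flagged with the 1.5 × IQR rule that box plots use for their whiskers. Whether the original notebook used this rule is an assumption; the helper `iqr_outliers` and the sample values are hypothetical.

```python
import pandas as pd


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return s[(s < lower) | (s > upper)]


# Made-up salaries (in thousands) with one extreme value.
toy_salary = pd.Series([30, 32, 35, 36, 38, 40, 41, 43, 45, 120])
outliers = iqr_outliers(toy_salary)
print(outliers.tolist())
```

Applying the same function to each numeric column gives a quick count of outliers per variable.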

Observation

1. The highest car price is 80,000 and the lowest is 18,000.

2. The skewness of Price resembles that of Partner_salary, which suggests a possible correlation between the two.

Let's explore the categorical variables now

Observations

1. Males make up 79.2% of the sample population.

2. Females make up 20.8% of the sample population.

Observations

1. Salaried customers make up 56.7% of the sample population.

2. Business customers make up 43.3% of the sample population.

Observations

1. Married customers are the largest group, at 91.3% of the sample population.

2. Single customers make up 8.7%.

Observations

1. Hatchback has the highest count, with 884 (55.9%) of the sample population.

2. Sedan is 460 (29.1%).

3. SUV is 237 (15.0%).
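The percentages follow directly from the counts and the 1,581 rows. A short check using only the counts reported above (the variable names are illustrative):

```python
# Counts per Make as reported in the observations above.
make_counts = {"Hatchback": 884, "Sedan": 460, "SUV": 237}

total = sum(make_counts.values())

# Share of each Make as a percentage, rounded to one decimal place.
shares = {make: round(100 * n / total, 1) for make, n in make_counts.items()}

print(total)
print(shares)
```

With a DataFrame at hand, `df["Make"].value_counts(normalize=True)` gives the same proportions without manual arithmetic.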

Observations

1. As expected, Age shows a high correlation with Price.

2. No_of_dependents is negatively correlated with Price, possibly because the number of dependents is low in single households.

3. It is important to note that correlation does not imply causation.

4. There does not seem to be a strong relationship between Total_salary and Price.
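Observations like these are typically read off a correlation matrix of the numeric columns. The sketch below computes Pearson correlations on a made-up `toy` frame; the numbers are illustrative and do not reproduce the Austo results.

```python
import pandas as pd

# Made-up values (salaries and prices in thousands), for illustration only.
toy = pd.DataFrame({
    "Age": [25, 30, 35, 50],
    "Salary": [50, 60, 70, 80],
    "Partner_salary": [0, 20, 10, 30],
    "Price": [20, 25, 30, 55],
    "Make": ["Hatchback", "Hatchback", "Sedan", "SUV"],
})
toy["Total_salary"] = toy["Salary"] + toy["Partner_salary"]

# numeric_only=True drops the text column before computing Pearson correlations.
corr = toy.corr(numeric_only=True)
print(corr.round(2))
```

A heatmap of `corr` is a common way to present the same matrix visually.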

A particularly interesting relationship between Age, Income, and Make of car can be seen in this graph.

Comment

1. There is a clear difference in the prices of each Make.

2. SUV is the most expensive car type.

3. Hatchback is the least expensive.
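A price comparison across car types can be summarized by grouping on Make and taking the median Price. The `toy` frame below is a made-up stand-in, so the printed medians are illustrative only.

```python
import pandas as pd

# Made-up prices (in thousands) for each car type.
toy = pd.DataFrame({
    "Make": ["Hatchback", "Hatchback", "Sedan", "Sedan", "SUV", "SUV"],
    "Price": [20, 24, 33, 37, 55, 61],
})

# Median price per Make, ordered from cheapest to most expensive.
median_price = toy.groupby("Make")["Price"].median().sort_values()
print(median_price)
```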

In the dataset, approximately 84% (1332 / 1581) of the customers have 2 or more dependents.
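The share quoted above is a simple ratio, and the same figure can be obtained from a boolean mask on the dependents column. In the sketch below, `share_reported` uses the counts from the text, while `toy_dependents` is a made-up series that only demonstrates the mask technique.

```python
import pandas as pd

# Ratio from the counts reported in the text.
share_reported = 1332 / 1581
print(f"{share_reported:.1%}")

# Same idea on a made-up series: the mean of a boolean mask is a proportion.
toy_dependents = pd.Series([0, 1, 2, 2, 3, 4], name="No_of_dependents")
share_toy = (toy_dependents >= 2).mean()
print(f"{share_toy:.1%}")
```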

Observation

1. SUV is the least-purchased car type across Total_salary levels.

2. This indicates that households with two salaries prefer Hatchback and Sedan.

3. This may point to more than one car per household, since both partners are working.
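One way to probe the two-income interpretation is to cross-tabulate Partner_working against Make and compare the row-wise shares. This is a sketch on a made-up `toy` frame, so the proportions it prints say nothing about the real data.

```python
import pandas as pd

# Made-up rows, for illustration only.
toy = pd.DataFrame({
    "Partner_working": ["Yes", "Yes", "Yes", "No", "No", "Yes"],
    "Make": ["Hatchback", "Sedan", "Hatchback", "SUV", "Sedan", "SUV"],
})

# normalize="index" turns each row into proportions that sum to 1.
make_share = pd.crosstab(toy["Partner_working"], toy["Make"], normalize="index")
print(make_share.round(2))
```

If households with a working partner really favor Hatchback and Sedan, the "Yes" row should show noticeably higher shares for those two types than the "No" row.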